Investigating the Usefulness of Generalized Word Representations in SMT
Authors
Abstract
We investigate the use of generalized representations (POS tags, morphological analysis, and word clusters) in phrase-based models and in the N-gram-based Operation Sequence Model (OSM). Our integration enables these models to learn richer lexical and reordering patterns, consider wider contextual information, and generalize better under sparse data conditions. When interpolating generalized OSM models on the standard IWSLT and WMT tasks, we observed improvements of up to +1.35 BLEU points on the English-to-German task and +0.63 on the German-to-English task. Using automatically generated word classes in both standard phrase-based models and OSM models yields an average improvement of +0.80 across 8 language pairs on the IWSLT shared task.
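To illustrate why generalized representations help in sparse data conditions, here is a minimal sketch. It is not the authors' implementation and does not build an OSM or phrase table. It maps words to classes through a small hand-written, hypothetical table (`WORD_CLASS`), standing in for POS tags or automatically induced clusters, and compares bigram counts over surface words with counts over classes. The helper names `generalize` and `bigram_counts` are illustrative only.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical word-to-class map. In practice the classes would come from a
# POS tagger, a morphological analyzer, or an unsupervised clustering tool.
WORD_CLASS: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "house": "N", "car": "N", "garden": "N",
    "big": "ADJ", "small": "ADJ",
}


def generalize(sentence: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each word by its class; words missing from the map become 'UNK'."""
    return [mapping.get(w, "UNK") for w in sentence]


def bigram_counts(corpus: List[List[str]]) -> Counter:
    """Count adjacent token pairs, adding sentence-boundary markers."""
    counts: Counter = Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Toy training corpus (hypothetical).
corpus = [
    ["the", "big", "house"],
    ["a", "small", "car"],
]

surface = bigram_counts(corpus)
classes = bigram_counts([generalize(s, WORD_CLASS) for s in corpus])

# The surface bigram ("small", "garden") never occurs in the corpus ...
print(surface[("small", "garden")])
# ... but its class-level pattern ("ADJ", "N") occurs in both sentences.
print(classes[(WORD_CLASS["small"], WORD_CLASS["garden"])])
```

Running it prints `0` and then `2`: a model estimated over surface words has no evidence for the unseen pair, while a model over classes has seen its generalized pattern twice. The paper applies this idea inside phrase-based and OSM models rather than to plain bigrams.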
Similar resources
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
On the Integral Representations of Generalized Relative Type and Generalized Relative Weak Type of Entire Functions
In this paper we establish the integral representations of generalized relative type and generalized relative weak type as introduced by Datta et al. [9]. We also investigate their equivalence relation under certain conditions.
On generalized reduced representations of restricted Lie superalgebras in prime characteristic
Let $\mathbb{F}$ be an algebraically closed field of prime characteristic $p>2$ and $(g, [p])$ a finite-dimensional restricted Lie superalgebra over $\mathbb{F}$. It is shown that any finite-dimensional indecomposable $g$-module is a module for a finite-dimensional quotient of the universal enveloping superalgebra of $g$. These quotient superalgebras are called the generalized reduced enveloping ...
Improving Statistical Machine Translation Using Word Sense Disambiguation
We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task, and moreover never...
Usefulness of Serum NT-proBNP in Diagnosis of Generalized Seizures in Egyptian Children
Background: Seizures may occur in as many as 1% of children. The most urgent type of seizure is the generalized tonic-clonic seizure (GTCS). N-terminal prohormone of brain natriuretic peptide (NT‐proBNP) has been considered a promising biomarker in numerous acute illnesses. We aimed to evaluate the usefulness of NT‐proBNP for diagnosis of g...